Revisiting Embedding Features for Simple Semi-supervised Learning
Authors
Abstract
Recent work has shown success in using continuous word embeddings learned from unlabeled data as features to improve supervised NLP systems, which is regarded as a simple semi-supervised learning mechanism. However, fundamental problems in effectively incorporating word embedding features within the framework of linear models remain. In this study, we investigate and analyze three different approaches for utilizing the embedding features, including a newly proposed distributional prototype approach. The presented approaches can be integrated into most of the classical linear models in NLP. Experiments on the task of named entity recognition show that each of the proposed approaches can better utilize the word embedding features, with the distributional prototype approach performing best. Moreover, combining the approaches provides additive improvements, outperforming the dense and continuous embedding features by nearly 2 points of F1 score.
Similar resources
Revisiting Semi-Supervised Learning with Graph Embeddings
We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embedding...
Compound Embedding Features for Semi-supervised Learning
There has been a recent trend in discriminative methods of NLP to use representations of lexical items learned from unlabeled data as features, in order to overcome the problem of data sparsity. In this paper, we investigated the usage of word representations learned by neural language models, i.e. word embeddings. We built compound features of continuous word embeddings based on clustering to ...
Semi-Supervised Dimensionality Reduction of Hyperspectral Image Based on Sparse Multi-Manifold Learning
In this paper, we proposed a new semi-supervised multi-manifold learning method, called semisupervised sparse multi-manifold embedding (S3MME), for dimensionality reduction of hyperspectral image data. S3MME exploits both the labeled and unlabeled data to adaptively find neighbors of each sample from the same manifold by using an optimization program based on sparse representation, and naturall...
Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding
This paper presents a new semi-supervised framework with convolutional neural networks (CNNs) for text categorization. Unlike the previous approaches that rely on word embeddings, our method learns embeddings of small text regions from unlabeled data for integration into a supervised CNN. The proposed scheme for embedding learning is based on the idea of two-view semi-supervised learning, which...
Semi-Supervised Learning with Multi-View Embedding: Theory and Application with Convolutional Neural Networks
This paper presents a theoretical analysis of multi-view embedding – feature embedding that can be learned from unlabeled data through the task of predicting one view from another. We prove its usefulness in supervised learning under certain conditions. The result explains the effectiveness of some existing methods such as word embedding. Based on this theory, we propose a new semi-supervised l...